ABSTRACT 

The use of statistical techniques to build approximations of expensive computer analysis codes 
pervades much of today’s engineering design. These statistical approximations, or metamodels, 
are used to replace the actual expensive computer analyses, facilitating multidisciplinary, 
multiobjective optimization and concept exploration. In this paper we review several of these 
techniques including design of experiments, response surface methodology, Taguchi methods, 
neural networks, inductive learning, and kriging. We survey their existing application in 
engineering design and then address the dangers of applying traditional statistical techniques to 
approximate deterministic computer analysis codes. We conclude with recommendations for the 
appropriate use of statistical approximation techniques in given situations and how common pitfalls 
can be avoided. 

1 INTRODUCTION 

Much of today's engineering analysis consists of running complex computer codes: supplying 
a vector of design variables (inputs) x and computing a vector of responses (outputs) y . Despite 
steady advances in computing power, the expense of running many analysis codes remains 
nontrivial; single evaluations of aerodynamic or finite-element analyses can take minutes to hours, if 
not longer. Moreover, this mode of query-and-response often leads to a trial and error approach to 
design whereby a designer may never uncover the functional relationship between x and y and 
therefore never identify the "best" settings for input values. 

Statistical techniques are widely used in engineering design to address these concerns. The 
basic approach is to construct approximations of the analysis codes that are more efficient to run 
and yield insight into the functional relationship between x and y. If the true nature of a computer 
analysis code is 

y = f(x), 

then a "model of the model" or metamodel [62] of the analysis code is 

$\hat{y} = g(x)$, and so $y = \hat{y} + \varepsilon$ 

where $\varepsilon$ represents both the error of approximation and measurement (random) errors. The most 
common metamodeling approach is to apply the design of experiments (DOE) to identify an 
efficient set of computer runs $(x_1, x_2, \ldots, x_n)$ and then use regression analysis to create a 
polynomial approximation of the computer analysis code. These approximations then can replace 
the existing analysis code while providing: 

• a better understanding of the relationship between x and y , 

• easier integration of domain dependent computer codes, and 

• fast analysis tools for optimization and exploration of the design space by using 
approximations in lieu of the computationally expensive analysis codes themselves. 

We have found that many applications (including our own) using these methods for 
computer-based design are statistically questionable because many analysis codes are deterministic, 
so the error of approximation is not due to random effects. This calls into question the subsequent 
statistical analyses of model significance. Consequently, we seek to highlight potential statistical 
pitfalls in metamodeling and provide general recommendations for the proper use of metamodeling 
techniques in computer-based engineering design. In Section 2 we present a review of 
metamodeling techniques including regression, neural networks, inductive learning, and kriging. 
We conclude Section 2 with an introduction to the general statistical approaches of response 
surface methodology and Taguchi's robust design. In Section 3 we describe the engineering 
design context for statistical applications, review existing applications and methods, and conclude 
with a closer look at deterministic applications of metamodeling. In Section 4 we present some 
recommendations for avoiding pitfalls in using metamodeling, and in Section 5 we conclude by 
discussing some more advanced issues that contribute to making metamodeling an active and 
interesting research area. 


2 REVIEW OF METAMODELING TECHNIQUES 

Metamodeling involves (a) choosing an experimental design for generating data, (b) choosing a 
model to represent the data, and then (c) fitting the model to the observed data. There are several 
options for each of these steps as shown in Figure 1, and we have attempted to highlight a few of 
the more prevalent ones. For example, building a neural network involves fitting a network of 
neurons by means of backpropagation to data which is typically hand selected, while Response 
Surface Methodology usually employs central composite designs, second order polynomials, and 
least squares regression analysis. 

In the remainder of this section we provide a brief overview of several of the options listed in 
Figure 1. In Section 2.1 the focus is on experimental designs, particularly (fractional) factorial 
designs, central composite designs and orthogonal arrays. In Section 2.2 we discuss model choice 
and model fitting, focusing on response surfaces, neural networks, inductive learning and kriging. 
We conclude with an overview of two of the more common metamodeling techniques, namely, 
response surface methodology and Taguchi's robust design. 

2.1 Experimental Design 

Properly designed experiments are essential for effective computer utilization. In engineering, 
traditionally a single parameter is varied (perturbed) and the effects are observed. Alternatively, 
combinations of factor settings are assigned either systematically (e.g., grid search) or randomly to 
provide an alternative for comparison. Experimental design techniques which were developed for 
physical experiments are being applied to the design of computer experiments to increase the 
efficiency of these analyses. In this section an overview of different types of experiment designs is 
provided along with measures of merit for selecting/comparing different experimental designs. 

2.1.1 A Survey of Experimental Designs. An experimental design represents a sequence of 
experiments to be performed, expressed in terms of factors (design variables) set at specified levels 
(predefined values). An experimental design is represented by a matrix X where the rows denote 
experiment runs, and the columns denote particular factor settings. 

Factorial Designs: The most basic experimental design is a full factorial design. The number of 
design points dictated by a full factorial design is the product of the number of levels for each 
factor. The most common are $2^k$ designs (for evaluating main effects and interactions) and $3^k$ designs (for 
evaluating main and quadratic effects and interactions) for k factors at 2 and 3 levels, respectively. 
A $2^3$ full factorial design is shown in Figure 2(a). 

The size of a full factorial experiment increases exponentially with the number of factors; this 
leads to an unmanageable number of experiments. Fractional factorial designs are used when 
experiments are costly and many factors are required. A fractional factorial design is a fraction of a 
full factorial design; the most common are $2^{k-p}$ designs in which the fraction is $1/2^p$. A half 
fraction of the $2^3$ full factorial design is shown in Figure 2(b). The reduction of the number of 
design points in a fractional factorial design is not without a price. The $2^3$ full factorial design 
shown in Figure 2(a) allows estimation of all main effects ($x_1$, $x_2$, $x_3$), all two factor interactions 
($x_1 x_2$, $x_1 x_3$, and $x_2 x_3$), as well as the three factor interaction ($x_1 x_2 x_3$). For the $2^{3-1}$ fractional factorial 
indicated by the solid dots in Figure 2(b), the main effects are aliased (or biased) with the two 
factor interactions. Aliased effects cannot be estimated independently unless they are known (or 
assumed) not to exist. 


Figure 2. Basic Three-Factor Designs: (a) $2^3$ Full Factorial, (b) $2^{3-1}$ Fractional Factorial, (c) Composite Design 
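
As a rough illustration of the aliasing described above, the following sketch enumerates a $2^3$ full factorial in coded units (-1/+1) and extracts a half fraction. The defining relation $x_1 x_2 x_3 = +1$ and the function names are assumptions made for this example; the half fraction drawn in Figure 2(b) may correspond to the other half.

```python
from itertools import product

def full_factorial(k):
    """All 2**k runs with each factor coded as -1 or +1."""
    return [list(run) for run in product((-1, 1), repeat=k)]

def half_fraction(k):
    """Keep the runs whose factor levels multiply to +1 (assumed defining relation)."""
    runs = []
    for run in full_factorial(k):
        sign = 1
        for level in run:
            sign *= level
        if sign == 1:
            runs.append(run)
    return runs

full = full_factorial(3)   # 8 design points
half = half_fraction(3)    # 4 design points

# In the half fraction the x1 column is identical to the x2*x3 column, so the
# main effect of x1 cannot be separated from the x2*x3 interaction.
x1_half = [run[0] for run in half]
x2x3_half = [run[1] * run[2] for run in half]
aliased_in_half = x1_half == x2x3_half

# In the full factorial the two columns differ, so both effects are estimable.
x1_full = [run[0] for run in full]
x2x3_full = [run[1] * run[2] for run in full]
aliased_in_full = x1_full == x2x3_full
```

Under this assumed defining relation each main effect is aliased with the two factor interaction of the remaining two factors, which is the price paid for halving the number of runs.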


Often $2^k$ and $2^{k-p}$ designs are used to identify or screen for important factors. When there are 
many factors, the sparsity of effects principle [87] can be invoked whereby the system is assumed 
to be dominated by main effects and low order interactions. Thus, two level fractional factorial 
designs are used to "screen" factors to identify those with the greatest effects. The sparsity of 
effects principle is not always valid, however; Hunter [59] notes that every design provides aliased 
estimates: quadratic and cubic effects, if present, bias the estimates of the mean and main effects 
when a two level fractional factorial design is used. 

One specific family of fractional factorial designs frequently used for screening is the two level 
Plackett-Burman (PB) designs [103]. These are used to study k = n-1 factors in n = 4m design 
points. PB designs in which n is a power of two are called geometric designs and are identical to 
$2^{k-p}$ fractional factorials. If n is strictly a multiple of four, the PB designs are referred to as 
non-geometric designs and have very messy alias structures. Their use in practical problems is 
problematic, particularly if the design is saturated (i.e., the number of factors is exactly n-1). If 
interactions are negligible, however, these designs allow unbiased estimation of all main effects, 
and require only one more design point than the number of factors; they also give the smallest 
possible variance [14]. Myers and Montgomery [91] present a more complete discussion of 
factorial designs and aliasing of effects. Minimum variance and minimum size designs are 
discussed in Section 2.1.2. 

Central Composite and Box-Behnken Designs: To estimate quadratic effects, $3^k$ or $3^{k-p}$ designs 
can be used but often require an unmanageable number of design points. The most common 
second order designs, configured to reduce the number of design points, are central composite and 
Box-Behnken designs. 

A central composite design (CCD) is a two level ($2^{k-p}$ or $2^k$) factorial design, augmented by $n_0$ 
center points and two "star" points positioned at $\pm\alpha$ for each factor. This design, shown for three 
factors in Figure 2(c), consists of $2^{k-p} + 2k + n_0$ total design points to estimate $2k + k(k-1)/2 + 1$ 
coefficients. For three factors, setting $\alpha = 1$ locates the star points on the centers of the faces of the 
cube, giving a face-centered central composite (CCF) design; note that for values of $\alpha$ other than 1, 
each factor is evaluated at five levels. 
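
The point counts above can be checked with a short sketch that assembles a CCD from a full $2^k$ factorial, 2k star points, and $n_0$ center points. The function names, the use of a full (rather than fractional) factorial core, and the value $\alpha = 1.682$ are illustrative assumptions only.

```python
from itertools import product

def central_composite(k, alpha=1.0, n_center=1):
    """CCD in coded units: 2**k factorial points, 2k star points, n_center center points."""
    factorial = [list(p) for p in product((-1.0, 1.0), repeat=k)]
    star = []
    for i in range(k):
        for sign in (-1.0, 1.0):
            point = [0.0] * k
            point[i] = sign * alpha      # star point on the axis of factor i
            star.append(point)
    center = [[0.0] * k for _ in range(n_center)]
    return factorial + star + center

def n_quadratic_coefficients(k):
    """Number of coefficients in a full second order model: 2k + k(k-1)/2 + 1."""
    return 2 * k + k * (k - 1) // 2 + 1

# Face-centered design (alpha = 1): every factor takes three levels.
ccf = central_composite(3, alpha=1.0)
ccf_levels = sorted({point[0] for point in ccf})

# Any other alpha (1.682 is an arbitrary illustrative value) gives five levels.
ccd = central_composite(3, alpha=1.682)
ccd_levels = sorted({point[0] for point in ccd})
```

For three factors this yields 8 + 6 + 1 = 15 design points for the 10 coefficients of the full quadratic model.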

Often it is desirable to use the smallest number of factor levels in an experimental design. One 
common class of such designs is the Box-Behnken designs [15]. These are formed by combining 
$2^k$ factorials with incomplete block designs. They do not contain points at the vertices of the 
hypercube defined by the upper and lower limits for each factor. This is desirable if these extreme 
points are expensive or impossible to test. More information about CCD and Box-Behnken 
designs can be found in [87]. 

Orthogonal Arrays: The experiment designs used by Taguchi, orthogonal arrays, are usually 
simply fractional factorial designs in two or three levels ($2^{k-p}$ and $3^{k-p}$ designs). These arrays are 
constructed to reduce the number of design points necessary; two-level $L_4$, $L_{12}$, and $L_{16}$ arrays, for 
example, allow 3, 11, and 15 factors/effects to be evaluated with 4, 12, and 16 design points, 
respectively. Often these designs are identical to Plackett-Burman designs [78]. The definition of 
orthogonality for these arrays and other experiment designs is given in Section 2.1.2. An 
overview of Taguchi's approach to parameter design is given in Section 2.3. 

“Space Filling” Designs: Many researchers advocate the use of “space filling” designs when 
sampling deterministic computer experiments. As discussed by Booker [13], in the “classical” 
design and analysis of physical experiments (i.e., using central composite and factorial designs), 
random variation is accounted for by spreading the sample points out in the design space and by 
taking multiple data points (replicates); see Figure 3. Sacks, et al. [117] state that the “classical” 
notions of experimental blocking, replication, and randomization are irrelevant when it comes to 
deterministic computer experiments; thus, sample points in the design and analysis of computer 
experiments (DACE) should be chosen to fill the design space. They suggest minimizing the 
integrated mean squared error (IMSE) over the design region by using IMSE-optimal designs; the 
“space filling” design illustrated in Figure 3(b) is an IMSE-optimal design. 

Figure 3. “Classical” and “Space Filling” Designs [adapted from 13]: (a) “Classical” Design, (b) “Space Filling” Design 


Koch [63] investigates the use of a modified central composite design which combines half 
fractions of a CCI and a CCF to more evenly distribute the points throughout the design space. 
Koehler and Owen [66] describe several Bayesian and Frequentist “space filling” designs, 
including maximum entropy designs, mean squared-error designs, minimax and maximin designs, 
Latin hypercubes, randomized orthogonal arrays, and scrambled nets. Minimax and maximin 
designs were originally proposed by Johnson, et al. [60] specifically for use with computer 
experiments. Shewry and Wynn [121] and Currin, et al. [35] use the maximum entropy principle 
to develop designs for computer experiments. Tang [128] describes orthogonal array-based Latin 
hypercubes which he asserts are more suitable for computer experiments than general Latin 
hypercubes. Park [100] discusses optimal Latin hypercube designs for computer experiments 
which either minimize IMSE or maximize entropy, spreading the points out over the design region. 
Morris and Mitchell [89] propose maximin distance designs found within the class of Latin 
hypercube arrangements since they “offer a compromise between the entropy/maximin criterion, 
and good projective properties in each dimension.” Owen [99] advocates the use of orthogonal 
arrays as suitable designs for computer experiments, numerical integration, and visualization; a 
collection of orthogonal array generators is available over the Internet [98]. A review of Bayesian 
experimental designs for linear and nonlinear regression models is given in [24]. 
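
To make the idea of a maximin design within the class of Latin hypercubes concrete, the sketch below draws random Latin hypercube samples on the unit hypercube and keeps the one with the largest minimum inter-point distance. This is a simple random search written for illustration, not the algorithm of Morris and Mitchell [89]; the function names, the number of tries, and the seed are assumptions.

```python
import random

def latin_hypercube(n, k, rng):
    """n points in [0, 1)^k; each factor's range is cut into n bins, each used exactly once."""
    columns = []
    for _ in range(k):
        bins = list(range(n))
        rng.shuffle(bins)
        columns.append([(b + rng.random()) / n for b in bins])
    return [[columns[j][i] for j in range(k)] for i in range(n)]

def min_distance(points):
    """Smallest Euclidean distance between any two design points."""
    best = float("inf")
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = sum((a - b) ** 2 for a, b in zip(points[i], points[j])) ** 0.5
            best = min(best, d)
    return best

def maximin_lhs(n, k, tries=200, seed=0):
    """Keep the random Latin hypercube whose closest pair of points is farthest apart."""
    rng = random.Random(seed)
    candidates = (latin_hypercube(n, k, rng) for _ in range(tries))
    return max(candidates, key=min_distance)

design = maximin_lhs(10, 2)
```

Every candidate retains the one-point-per-bin projection property in each dimension, while the distance criterion discourages clustered points.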

2.1.2 Measures of Merit for Evaluating Experimental Designs. Selecting the appropriate 
design is essential for effective experimentation: the desire to gain as much information as possible 
about the response-factor relationships is balanced against the cost of experimentation. Several 
measures of merit are available and useful for evaluating and comparing experimental designs. 

Orthogonality, Rotatability, Minimum Variance, and Minimum Bias: To facilitate efficient 
estimates of parameters, four desirable characteristics of an experimental design are orthogonality, 
rotatability, minimum variance, and minimum bias. A design is orthogonal if, for every pair of 
factors $x_i$ and $x_j$, the sum of the cross-products of the N design points 

$\sum_{u=1}^{N} x_{iu} x_{ju}$ 

is zero. For a first order model, the estimates of all coefficients will have minimum variance if the 
design can be configured so that 

$\sum_{u=1}^{N} x_{iu}^2 = N$; 

the predictions $\hat{y}$ will also have constant variance at a fixed distance from the center of 
the design, and the design will also be rotatable. 

In second order modeling, Hunter [58] suggests that orthogonality is less important: "If the 
objective of the experimenter is to forecast a response at either present or future settings of x, then 
an unbiased minimum variance estimate of the forecast $\hat{y}$ is required. In the late 1950's Box and 
his co-workers demonstrated that rotatability... and the minimization of bias from higher order 
terms... were the essential criteria for good forecasting." A design is rotatable if $N \cdot \mathrm{Var}[\hat{y}(x)]/\sigma^2$ 
has the same value at any two locations that are the same distance from the design center. The 
requirements for minimum variance and minimum bias designs for second order models are 
beyond the scope of this work; we refer the reader to [91] for more information. 
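
As a small worked check of the first order orthogonality and minimum variance conditions, consider a $2^2$ full factorial in coded units with the N = 4 runs (-1,-1), (+1,-1), (-1,+1), (+1,+1). The choice of this particular design is ours, for illustration only.

```latex
\begin{aligned}
\sum_{u=1}^{4} x_{1u} x_{2u} &= (-1)(-1) + (+1)(-1) + (-1)(+1) + (+1)(+1) = 0, \\
\sum_{u=1}^{4} x_{1u}^{2} &= 1 + 1 + 1 + 1 = 4 = N .
\end{aligned}
```

Both conditions hold, so the design is orthogonal and satisfies the stated minimum variance condition for a first order model. Dropping the run (+1,+1) would leave a cross-product sum of -1, and the remaining three runs would no longer be orthogonal.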

Unsaturated/Saturated and Supersaturated Designs: In many cases, the primary concern in the 
design of an experiment is its size. Most designs are unsaturated in that they contain at least two 
more design points than the number of factors. A saturated design is one in which the number of 
design points is equal to one more than the number of factor effects to be estimated. Saturated 
fractional factorial designs allow unbiased estimation of all main effects with the smallest possible 
variance and size [14]. The most common examples of saturated designs are the Plackett-Burman 
two level design and Taguchi’s orthogonal arrays. For estimating second order effects, small 
composite designs have been developed to reduce the number of required design points. A small 
composite design is saturated if the number of design points is $2k + k(k-1)/2 + 1$ (the number of 
coefficients to be estimated for a full quadratic model). Myers and Montgomery [91] note that 
recent work has suggested that these designs may not always be good; additional comments on 
small composite designs can be found in [16, 77]. Finally, in supersaturated designs the number 
of design points is less than or equal to the number of factors [37, 38]. 

It is most desirable to use unsaturated designs for predictive models, unless running the 
necessary experiments is prohibitively expensive. When comparing experiments based on the 
number of design points and the information obtained, the D-optimal and D-efficiency statistics are 
often used. 

D-optimal and D-efficiency: A design is said to be D-optimal if $|X'X|/n^p$ is maximized, where X 
is the expanded design matrix which has n rows (one for each design setting) and p columns (one 
column for each coefficient to be estimated plus one column for the overall mean). The 
D-efficiency statistic for comparing designs, Eq. (1), compares a design against a D-optimal design, 
normalized by the size of the matrix in order to compare designs of different sizes. 

$\text{D-efficiency} = \left( |X'X|_{\text{design}} / |X'X|_{\text{D-optimal}} \right)^{1/p}$ (1) 
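
A minimal sketch of these statistics for a first order model in two factors follows. It compares a three-run one-factor-at-a-time design against the $2^2$ full factorial, which is taken here as the reference design; the per-run normalization $|X'X|/n^p$ is applied to both designs before forming the ratio, and all function names are assumptions made for the example.

```python
def det(m):
    """Determinant by cofactor expansion (adequate for the small matrices used here)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, value in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * value * det(minor)
    return total

def information_matrix(X):
    """X'X for an expanded design matrix X given as a list of rows."""
    p = len(X[0])
    return [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]

def d_criterion(X):
    """|X'X| / n**p, the quantity maximized by a D-optimal design."""
    n, p = len(X), len(X[0])
    return det(information_matrix(X)) / n ** p

# Expanded design matrices: a column of ones for the mean, then x1 and x2.
factorial = [[1, -1, -1], [1, 1, -1], [1, -1, 1], [1, 1, 1]]
one_at_a_time = [[1, -1, -1], [1, 1, -1], [1, -1, 1]]

d_factorial = d_criterion(factorial)          # 64 / 4**3
d_one_at_a_time = d_criterion(one_at_a_time)  # 16 / 3**3

p = 3
d_efficiency = (d_one_at_a_time / d_factorial) ** (1.0 / p)
```

In this example the one-factor-at-a-time design reaches a D-efficiency of roughly 0.84 relative to the factorial, even though it uses one run fewer.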

Other statistics for comparing designs such as G-efficiency, Q-efficiency, and A-optimality have 
also been formulated, see, e.g., [91]. We now turn to the issues of model choice and model 
fitting. 

2.2 Model Choice and Model Fitting 

After selecting an appropriate experimental design and performing the necessary computer 
runs, the next step is to choose an approximating model and fitting method. Many alternative 
models and methods exist, but here we review the four which are most prevalent in the literature: 
response surfaces, neural networks, inductive learning, and kriging. 

2.2.1 Response Surfaces. Given a response, y, and a vector of independent factors x 
influencing y, the relationship between y and x is: 

$y = f(x) + \varepsilon$, (2) 

where $\varepsilon$ represents random error which is assumed to be normally distributed with mean zero and 
standard deviation $\sigma$. Since the true response surface function f(x) is usually unknown, a response 
surface g(x) is created to approximate f(x). Predicted values are then obtained using $\hat{y} = g(x)$. 

The most widely used response surface approximating functions are low-order polynomials. 
For low curvature, a first order polynomial can be used as in Eq. (3); for significant curvature, a 
second order polynomial which includes all two-factor interactions is available, see Eq. (4). 

$\hat{y} = \beta_0 + \sum_{i=1}^{k} \beta_i x_i$ (3) 

$\hat{y} = \beta_0 + \sum_{i=1}^{k} \beta_i x_i + \sum_{i=1}^{k} \beta_{ii} x_i^2 + \sum_{i} \sum_{j,\, i<j} \beta_{ij} x_i x_j$ (4) 

The parameters of the polynomials in Eq. (3) and Eq. (4) are usually determined by least 
squares regression analysis by fitting the response surface approximations to existing data. These 
approximations are normally used for prediction within response surface methodology (RSM). A 
more complete discussion of response surfaces and least squares fitting is presented in [91]. An 
overview of RSM is given in Section 2.3. 
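
The least squares fit of the second order polynomial in Eq. (4) can be sketched for two factors as follows. The `analysis_code` function is a hypothetical stand-in for an expensive deterministic code, the $3^2$ grid is an arbitrary choice of design, and the normal equations are solved with a small hand-written Gaussian elimination so that the example needs no external libraries.

```python
from itertools import product

def quadratic_terms(x1, x2):
    """Model terms of the second order polynomial for two factors."""
    return [1.0, x1, x2, x1 * x1, x2 * x2, x1 * x2]

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (M[i][n] - sum(M[i][j] * z[j] for j in range(i + 1, n))) / M[i][i]
    return z

def least_squares(X, y):
    """Coefficients from the normal equations (X'X) beta = X'y."""
    p = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)

def analysis_code(x1, x2):
    """Hypothetical deterministic 'expensive' code (an exact quadratic, for checking)."""
    return 3.0 + 2.0 * x1 - 1.0 * x2 + 0.5 * x1 * x1 + 1.5 * x2 * x2 - 0.75 * x1 * x2

design = list(product((-1.0, 0.0, 1.0), repeat=2))   # 3^2 grid in coded units
X = [quadratic_terms(*point) for point in design]
y = [analysis_code(*point) for point in design]
beta = least_squares(X, y)   # order: b0, b1, b2, b11, b22, b12
```

Because the stand-in function is itself a quadratic, the fitted coefficients reproduce it to within rounding error; for a real analysis code the residuals would reflect the error of approximation rather than random noise.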

2.2.2 Neural Networks. A neural network is composed of neurons (single-unit perceptrons) 
which are multiple linear regression models with a nonlinear (typically sigmoidal) transformation 
on y. If the inputs to each neuron are denoted $\{x_1, x_2, \ldots, x_n\}$, and the regression coefficients are 
denoted by the weights, $w_i$, then the output, y, might be given by 

$y = \frac{1}{1 + e^{-\eta/T}}$ (5) 

where $\eta = \sum w_i x_i + \beta$ (where $\beta$ is the "bias value" of a neuron), and T is the slope parameter of the 
sigmoid defined by the user. A neural network is then created by assembling the neurons into an 
architecture, the most common of which is the multi-layer feedforward architecture; see Figure 4. 



Figure 4. Typical Neuron and Architecture: (a) Single-Unit Perceptron, (b) Feedforward Two-layer Architecture 

There are two main issues in building a neural network: (1) specifying the architecture and (2) 
training the neural network to perform well with reference to a training set. "To a statistician, this 
is equivalent to (i) specifying a regression model, and (ii) estimating the parameters of the model 
given a set of data" [30]. If the architecture is made large enough, a neural network can be a nearly 
universal approximator [115]. Hajela and Berke [54] review the use of neural networks in 
structural analysis and design. 

"Training" a neural network is the determination of the proper values for all weights, $w_i$, in the 
architecture and is usually done by backpropagation [115]; this requires a set of n training data 
points $\{(x_1, y_1), (x_2, y_2), \ldots, (x_p, y_p)\}$. For a network with output y, the performance is 

$E = \sum_p (\hat{y}_p - y_p)^2$ (6) 

where $\hat{y}_p$ is the output that results from the network given input $x_p$, and E is the total error of the 
system. The weights are then adjusted in proportion to 

$\frac{\partial E}{\partial y} \frac{\partial y}{\partial w_{ij}}$ (7) 
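
The weight update can be illustrated for the simplest possible case, a single neuron of the form in Eq. (5) trained by batch gradient descent on the error in Eq. (6). This is a sketch of the gradient step only, not full backpropagation through a multi-layer architecture; the training data (a logical OR), the learning rate, the number of epochs, and the function names are all assumptions, and the slope parameter T is fixed at 1 during training.

```python
from math import exp

def neuron(inputs, weights, bias, T=1.0):
    """Sigmoid of eta = sum(w_i * x_i) + bias, with slope parameter T."""
    eta = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + exp(-eta / T))

def total_error(data, weights, bias):
    """Sum of squared differences between the neuron output and the target."""
    return sum((neuron(x, weights, bias) - y) ** 2 for x, y in data)

def train(data, weights, bias, rate=0.5, epochs=500):
    """Batch gradient descent: move each weight against dE/dw (T = 1)."""
    weights = list(weights)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in data:
            out = neuron(x, weights, bias)
            # Chain rule: dE/d(out) times d(out)/d(eta) for the sigmoid.
            delta = 2.0 * (out - y) * out * (1.0 - out)
            for i, xi in enumerate(x):
                grad_w[i] += delta * xi      # d(eta)/d(w_i) = x_i
            grad_b += delta                  # d(eta)/d(bias) = 1
        weights = [w - rate * g for w, g in zip(weights, grad_w)]
        bias -= rate * grad_b
    return weights, bias

# Hypothetical training set: two inputs, target is their logical OR.
data = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 1.0)]

initial_error = total_error(data, [0.0, 0.0], 0.0)
weights, bias = train(data, [0.0, 0.0], 0.0)
final_error = total_error(data, weights, bias)
```

With all weights initialized to zero every output is 0.5, so the initial total error is 1.0; training reduces it from that starting value.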

Neural networks are best suited for approximating deterministic functions in regression-type 
applications. "In most applications of neural networks that generate regression-like output, there is 
no explicit mention of randomness. Instead, the aim is function approximation” [30]. Typical 
applications of neural nets are speech recognition and handwritten character recognition where the 
data is complex and of high dimensionality. Networks with tens of thousands of parameters have 
been used but the requisite gathering of training data and calculation of model parameters can be 
extremely computationally expensive. Cheng and Titterington [30] comment that "...the procedure 
is to toss the data directly into the NN software, use tens of thousands of parameters in the fit, let 
the workstation run 2-3 weeks grinding away doing the gradient descent, and voila, out comes the 
result." Rogers and LaMarsh [111] describe parallel computing efforts aimed at reducing the time 
required to “train” neural networks. 

2.2.3 Inductive Learning. Inductive learning is one of five main paradigms of machine learning 
that also include neural networks, case-based learning, genetic algorithms, and analytic learning 
[68]. Of these five, inductive learning is the most akin to regression and metamodeling and is 
therefore the focus here. An inductive learning system induces rules from examples; the 
fundamental modeling constructs are condition-action rules which partition the data into discrete 
categories and can be combined into decision trees for ease of interpretation, see Figure 5. 



Figure 5. A Decision Tree 


Training data are required in the form $\{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$ where $x_i$ is a vector of 
attribute values (e.g., processing parameters and environmental conditions), and each $y_i$ is a 
corresponding observed output value. Although attributes and outputs can be real-valued, the 
method is better suited to discrete-valued data; real values must often be transformed into discrete 
representations [41]. Once the data have been collected, training algorithms build a decision tree by 
selecting the "best" divisive attribute and then recursively applying the procedure to the resulting data subsets. 
Although trees can be built by selecting attributes randomly, it is more efficient to select attributes 
that minimize the amount of information needed for category membership. The mathematics of 
such an information-theoretic approach are given in [41]. 
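
One common way to score the "best" divisive attribute is the entropy-based information gain used in ID3-style tree builders; we assume that criterion here for illustration, and it may differ in detail from the formulation in [41]. The attribute names and the four training examples are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of category labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attribute):
    """Reduction in entropy obtained by splitting the data on one attribute."""
    n = len(rows)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def best_attribute(rows, labels):
    """Attribute whose split leaves the least information still needed."""
    return max(rows[0].keys(), key=lambda a: information_gain(rows, labels, a))

# Hypothetical process data with discrete-valued attributes.
rows = [
    {"temperature": "high", "speed": "fast"},
    {"temperature": "high", "speed": "slow"},
    {"temperature": "low", "speed": "fast"},
    {"temperature": "low", "speed": "slow"},
]
labels = ["defect", "defect", "ok", "ok"]

gain_temperature = information_gain(rows, labels, "temperature")
gain_speed = information_gain(rows, labels, "speed")
root = best_attribute(rows, labels)
```

Here temperature separates the two categories completely while speed tells us nothing, so temperature would be chosen for the root of the tree; a tree builder would then repeat the same selection on each resulting subset.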

Many of the applications of inductive learning have been in process control and diagnostic 
systems, and inductive learning approaches can be used to automate the knowledge-acquisition 
process of building expert systems. Furthermore, although decision trees appear best suited for 
applications with discrete input and output values, there are also applications with continuous 
variables that have met with greater success than standard statistical analysis. Leech [71] reports a 
process-control application where "Standard statistical analysis methods were employed with 
limited success. Some of the data were non-numerical, the dependencies between variables were 
not well understood, and it was necessary to simultaneously control several characteristics of the 
final product while working within system constraints. The results of the statistical analysis, a set 
of correlations for each output of interest, were difficult for people responsible for the day-to-day 
operation to interpret and use." Additional examples can be found in [41, 68]. 

2.2.4 Kriging. Since many computer analysis codes are deterministic and therefore not subject to 
measurement error, the usual measures of uncertainty derived from least-squares residuals have no 
obvious meaning (cf., [117]). Consequently, some statisticians (see, e.g., [13, 66, 117, 138]) 
have suggested modeling responses as a combination of a polynomial model plus departures of the 
form: 

$y(x) = f(x) + Z(x)$ (8) 

where y(x) is the unknown function of interest, f(x) is a known polynomial function of x, and 
Z(x) is the realization of a Gaussian random process with mean zero, variance 
$\sigma^2$, and non-zero covariance. The f(x) term in Eq. (8) is similar to the polynomial model in a 
response surface and provides a "global" model of the design space; in many cases f(x) is simply 
taken to be a constant term, see, e.g., [117, 139]. 

While f(x) "globally" approximates the design space, Z(x) creates "localized" deviations so 
that the kriging model interpolates the $n_s$ sampled data points. The covariance matrix of Z(x) is 
given by: 

$\mathrm{Cov}[Z(x^i), Z(x^j)] = \sigma^2 \mathbf{R}([R(x^i, x^j)])$, (9) 

where $\mathbf{R}$ is the correlation matrix, and $R(x^i, x^j)$ is the correlation function between any two of the $n_s$ 
sampled data points $x^i$ and $x^j$. $\mathbf{R}$ is an ($n_s \times n_s$) symmetric matrix with ones along the diagonal. The 
correlation function $R(x^i, x^j)$ is specified by the user; Sacks, et al. [117] and Koehler and Owen 
[66] discuss several correlation functions which may be used. In our work, we have employed a 
Gaussian correlation function of the form: 

$R(x^i, x^j) = \exp\left[ -\sum_{k} \theta_k |x_k^i - x_k^j|^2 \right]$ (10) 

where $\theta_k$ are the unknown correlation parameters used to fit the model, and the $x_k^i$ and $x_k^j$ are the 
$k^{th}$ components of sample points $x^i$ and $x^j$. In some cases using a single correlation parameter 
gives sufficiently good results, see, e.g., [94, 117]. 

Predicted estimates, $\hat{y}(x)$, of the response y(x) at untried values of x are given by: 

$\hat{y} = \hat{\beta} + r^T(x) \mathbf{R}^{-1} (y - f\hat{\beta})$ (11) 

where y is the column vector of length $n_s$ which contains the values of the response at each sample 
point, and f is a column vector of length $n_s$ which is filled with ones when f(x) is taken as a 
constant. In Eq. (11), $r^T(x)$ is the correlation vector of length $n_s$ between an untried x and the 
sampled data points $\{x^1, x^2, \ldots, x^{n_s}\}$ and is given by: 

$r^T(x) = [R(x, x^1), R(x, x^2), \ldots, R(x, x^{n_s})]^T$ (12) 

In Eq. (11), $\hat{\beta}$ is estimated using Eq. (13). 

$\hat{\beta} = (f^T \mathbf{R}^{-1} f)^{-1} f^T \mathbf{R}^{-1} y$ (13) 

The estimate of the variance, $\hat{\sigma}^2$, from the underlying global model (not the variance in the 
observed data) is given by 

$\hat{\sigma}^2 = \frac{(y - f\hat{\beta})^T \mathbf{R}^{-1} (y - f\hat{\beta})}{n_s}$ (14) 

where f(x) is assumed to be the constant $\hat{\beta}$. The maximum likelihood estimates (i.e., "best 
guesses") for the $\theta_k$ in Eq. (10) used to fit the model are found by maximizing [13]: 

$-\frac{[n_s \ln(\hat{\sigma}^2) + \ln|\mathbf{R}|]}{2}$ (15) 

for $\theta_k > 0$, where both $\hat{\sigma}^2$ and $|\mathbf{R}|$ are functions of $\theta_k$. While any values for the $\theta_k$ create an 
interpolative approximation model, the “best” kriging model is found by solving the k-dimensional 
unconstrained nonlinear optimization problem given by Eq. (15). 

Depending on the choice of correlation function in Eq. (10), kriging can either "honor the 
data," providing an exact interpolation of the data, or "smooth the data," providing an inexact 
interpolation [34]. Finally, it should be noted that kriging is different from fitting splines (i.e., 
non-parametric regression models). In several comparative studies kriging performs as well as, if 
not better than, splines [69]. 
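
The kriging predictor with a constant global term can be sketched in one dimension as follows, using the Gaussian correlation function with a single, fixed correlation parameter. The maximum likelihood search for the correlation parameter is deliberately omitted, and the value `theta = 10.0`, the sample points, the test function, and all function names are assumptions chosen for illustration.

```python
from math import exp, sin

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (M[i][n] - sum(M[i][j] * z[j] for j in range(i + 1, n))) / M[i][i]
    return z

def corr(a, b, theta):
    """Gaussian correlation between two one-dimensional points."""
    return exp(-theta * (a - b) ** 2)

def fit_kriging(xs, ys, theta):
    """Estimate the constant term, the weights R^-1 (y - f*beta), and the process variance."""
    n = len(xs)
    R = [[corr(xs[i], xs[j], theta) for j in range(n)] for i in range(n)]
    R_inv_f = solve(R, [1.0] * n)            # f is a vector of ones
    R_inv_y = solve(R, ys)
    beta = sum(R_inv_y) / sum(R_inv_f)       # (f'R^-1 f)^-1 f'R^-1 y
    resid = [y - beta for y in ys]
    weights = solve(R, resid)
    sigma2 = sum(r * w for r, w in zip(resid, weights)) / n
    return beta, weights, sigma2

def predict(x, xs, beta, weights, theta):
    """Kriging prediction: beta + r(x)' R^-1 (y - f*beta)."""
    return beta + sum(corr(x, xi, theta) * w for xi, w in zip(xs, weights))

theta = 10.0                                  # fixed by assumption, not by maximum likelihood
xs = [0.0, 0.25, 0.5, 0.75, 1.0]
ys = [x * sin(6.0 * x) for x in xs]           # hypothetical deterministic response
beta, weights, sigma2 = fit_kriging(xs, ys, theta)
at_samples = [predict(x, xs, beta, weights, theta) for x in xs]
between = predict(0.6, xs, beta, weights, theta)
```

With this correlation function the predictor reproduces the sampled responses exactly (to rounding error), which is the "honor the data" behavior described above; a full implementation would also search over the correlation parameter.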

2.2.5 Additional Metamodeling Approaches. For the reader’s convenience, we include 
references for some alternative metamodeling techniques which have not been discussed in the 
previous subsections. Rasmussen [106] offers an accumulated approximation technique for 
structural optimization which refines the approximation of objective and constraint functions by 
accumulating the function values of previously visited points. Similarly, Balling and Clark [4] 
describe weighted and gradient-based approximations for use with optimization which utilize 
weighted sums of exact function values at sample points. Friedman [45] describes multivariate 
adaptive regression splines (MARS): a flexible regression modeling method based on recursive 
partitioning and spline fitting for high dimensional data. Dyn, et al. [39] use radial basis functions 
to build global approximation surfaces to interpolate smooth data. Wang, et al. [137] present 
multivariate Hermite approximations for multidisciplinary design optimization which uses data 
generated during the course of iterative optimization; it is compared against linear, reciprocal, and 
other standard approximations but shows inefficiencies because it requires more data points. 
Finally, Friedman and Stuetzle [46] introduce projection pursuit regression which works well in 
high-dimensional (< 50) data and with large data sets (can handle 200,000+ data points); projection 
pursuit regression takes the data and generates different projections of it along linear combinations 
of the variables; an optimizer finds the best projections and builds a predictor by summing them 
together with arbitrary levels of precision. 

This concludes our discussion on experimental design, model selection and model fitting. We 
now turn to more general methods for experimental design and model building. 

2.3 Experimentation and Metamodeling Strategies 

Two widely used methods incorporating experimental design, model building, and prediction 
are response surface methodology and Taguchi's robust design or parameter design. A brief 
overview of these two approaches is provided. 

2.3.1 Response Surface Methodology (RSM). Different authors describe RSM differently. 
Myers, et al. [90] define RSM as "a collection of tools in design or data analysis that enhance the 
exploration of a region of design variables in one or more responses." Box and Draper [16] state 
that, "Response surface methodology comprises a group of statistical techniques for empirical 
model building and model exploitation. By careful design and analysis of experiments, it seeks to 
relate a response, or output variable, to the levels of a number of predictors, or input variables, that 
affect it." Finally, Myers and Montgomery [91] state that RSM “is a collection of statistical and 
mathematical techniques useful for developing, improving, and optimizing processes. It also has 
important applications in the design, development, and formulation of new products, as well as in 
the improvement of existing product designs.” 

The “collection of statistical and mathematical techniques” of which these authors speak refers 
to the design of experiments (Section 2.1), least squares regression analysis and response surface 
model building (Section 2.2.1), and "model exploitation," exploring a factor space seeking 
optimum factor settings. The general RSM approach includes all or some of the following steps: 

i) screening: when the number of factors is large or when experimentation is expensive, 
screening experiments are used to reduce the set of factors to those that are most influential to 
the response(s) being investigated; 

ii) first order experimentation: when the starting point is far from the optimum point or when 
knowledge about the space being investigated is sought, first order models and an approach 
such as steepest ascent are used to "rapidly and economically move to the vicinity of the 
optimum" [88]; 

iii) second order experimentation: after the best solution using first order methods is obtained, a 
second order model is fit in the region of the first order solution to evaluate curvature effects 
and to attempt to improve the solution. 
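
Steps (ii) and (iii) can be sketched as follows. This is a minimal illustration under stated assumptions: the response function, the starting point, the design scaling, and the step size are all hypothetical, and screening (step i) is omitted because only two factors are used.

```python
import numpy as np

def response(x):
    """Hypothetical response with a maximum at (2, 1); stands in for the analysis code."""
    return 10.0 - (x[0] - 2.0) ** 2 - 2.0 * (x[1] - 1.0) ** 2

def fit_least_squares(A, y):
    """Ordinary least squares estimate of the model coefficients."""
    return np.linalg.lstsq(A, y, rcond=None)[0]

# ii) first order experimentation: 2^2 factorial around an assumed starting point
start = np.array([0.0, 0.0])
factorial = np.array([[-1, -1], [1, -1], [-1, 1], [1, 1]], dtype=float)
X1 = start + 0.5 * factorial
y1 = np.array([response(p) for p in X1])
A1 = np.column_stack([np.ones(len(X1)), X1])
b = fit_least_squares(A1, y1)
direction = b[1:] / np.linalg.norm(b[1:])  # path of steepest ascent

# Step along the path while the response keeps improving
x = start.copy()
step = 0.25
while response(x + step * direction) > response(x):
    x = x + step * direction

# iii) second order experimentation: central composite design around x
alpha = np.sqrt(2.0)
axial = alpha * np.array([[1, 0], [-1, 0], [0, 1], [0, -1]], dtype=float)
ccd = np.vstack([factorial, axial, np.zeros((1, 2))])
X2 = x + 0.5 * ccd
y2 = np.array([response(p) for p in X2])
A2 = np.column_stack([np.ones(len(X2)), X2[:, 0], X2[:, 1],
                      X2[:, 0] ** 2, X2[:, 1] ** 2, X2[:, 0] * X2[:, 1]])
c = fit_least_squares(A2, y2)

# Stationary point of the fitted quadratic: set its gradient to zero
H = np.array([[2.0 * c[3], c[5]], [c[5], 2.0 * c[4]]])
x_opt = np.linalg.solve(H, -c[1:3])
```

Since the hypothetical response is itself quadratic, the second order model recovers the true optimum; for a real analysis code the stationary point is only an estimate and should be confirmed with an additional run.
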

A more detailed description of RSM techniques and tools can be found in [91], and a 
comprehensive review of RSM developments and applications from 1966-1988 is given in [90]. 
In Section 3 we review recent applications in aerospace and mechanical engineering design, but 
first we discuss Taguchi’s robust design approach. 

2.3.2 Taguchi's Robust Design. Genichi Taguchi developed an approach for industrial 
product design built on statistically designed experiments. Taguchi’s robust design for quality 
engineering includes three steps: system design, parameter design, and tolerance design [18]. The 
key step is parameter design within which statistical experimentation is incorporated. 

Rather than simply improving or optimizing a response value, the focus in parameter design is 
to identify factor settings that minimize variation in performance and adjust the mean performance 
to a desired target in order to minimize the associated loss. Factors included in experimentation 
include control factors and noise factors; control factors are set and held at specific values, while 
noise factors cannot be controlled, e.g., shop floor temperature. The evaluation of mean 
performance and performance variation is accomplished by "crossing" two orthogonal arrays 
(Section 2.1.1). Control factors are varied according to an inner, or "control", array, and for 
each run of the control array, noise factors are varied according to an outer, or "noise", array. For 
each control factor experiment, a response value is obtained for each noise factor design point. 
The mean and variance of the response (measured across the noise design points) are calculated. 
The performance characteristic used by Taguchi is a signal-to-noise (S/N) ratio defined in terms of 
the mean and variance of the response. Several alternate S/N ratios are available based on whether 
lower, higher, or nominal response values are desired, see, e.g., [112]. 

The Taguchi approach does not explicitly include model building and optimization. Analysis of 
experimental results is used to identify factor effects, to plan additional experiments, and to set 
factor values for improved performance. A comprehensive discussion of the Taguchi approach is 
given in [102, 112]. Taguchi methods have been used extensively in engineering design and are 
often incorporated within traditional RSM for efficient, effective, and robust design [91]. These 
applications and their implications for engineering design are discussed next. 

3 METAMODELING IN ENGINEERING DESIGN 

How are the metamodeling techniques of the previous section employed in engineering design? 
All of these techniques can be used to create approximations of existing computer analyses, and 
produce fast analysis modules for more efficient computation. These metamodeling techniques 
also yield insight into the functional relationship between input and output parameters. 

Where would such models be useful? A designer's goal is usually to arrive at improved or 
robust solutions which are the values of design variables that best meet the design objectives as 
shown in Figure 6. A search for these solutions usually relies on an optimization technique which 
generates and evaluates many potential solutions in the path toward design improvement; thus, fast 
analysis modules are an imperative. 

When are metamodels useful or appropriate? In the later stages of design when detailed 
information about specific solutions is available, highly accurate analysis is essential. In the early 
stages of design, however, the focus is on generating, evaluating, and comparing potential 
conceptual configurations. The early stages of design are characterized by a large amount of 
information, often uncertain, which must be managed. To ensure the identification of a "good" 
system configuration, a comprehensive search is necessary. In this case, the tradeoff between 
accuracy and efficiency may be appropriate. The creation of metamodels allows fast analysis, 
facilitating both comprehensive and efficient design space search at the expense of a (hopefully 
slight) loss of accuracy. 

[Figure 6 graphic not recoverable; surviving labels: "Screening", "(improved or robust)"] 

Figure 6. Principal Use of Statistics in Computer-Based Design 


Having established our engineering design “context” for using metamodels, we present a 
review of several statistical applications in engineering design in Section 3.1. In Section 3.2 we 
discuss general statistical methods which have been developed for engineering applications, and 
we conclude by discussing some of the pitfalls associated with the application of statistical 
techniques to deterministic computer experiments in Section 3.3. This then paves the way for 
Section 4, guidelines for the appropriate use of statistics in computer-based design. 

3.1 Applications in Engineering Design 

3.1.1 DOE, RSM, and Taguchi's Robust Design. In Table 1 we present a survey of several 
engineering applications of design of experiments, response surface methodology, and Taguchi's 
robust design approach. Most of these examples come from aerospace and mechanical engineering 
design applications presented at conferences in recent years. A review of approximation concepts 
used in structural design can be found in [7]. 

Some observations regarding our findings are as follows. 

• Central composite designs and D-optimal designs seem to be preferred among aerospace 
engineers while orthogonal arrays (OAs) are preferred by mechanical engineers; grid and random 
point searches are seldom used since they are less efficient. 

• Optimization seems to be the principal driver for aerospace applications of DOE and RSM; these 
types of applications typically involve the use of computer intensive analysis and optimization 
routines, and DOE and RSM are a logical choice for increased efficiency. 

• Mechanical engineers usually use OAs and Taguchi's approach for robust design and the signal- 
to-noise ratio for parameter and tolerance design. 

• Very few designers actually model Taguchi's loss function directly (see, e.g., [9]); many prefer 
to model the response instead. 

• Most applications use second order response surface models; there are only a few cases where 
higher order (see, e.g., [136]) and mixed polynomial models (see, e.g., [113]) are used for 
engineering design. 

• When orthogonal arrays are used, special care must be taken to avoid aliasing main effects with 
interactions, unless the interactions are known (or assumed) to be insignificant. 

• Most applications utilize least squares regression analysis when fitting a model; only a few use 
stepwise regression, and this is usually because the model is not second order. 

3.1.2 Kriging Applications. Kriging, also referred to as DACE (Design and Analysis of 
Computer Experiments, after the inaugural paper [117]) modeling, has found limited use in 
engineering design applications perhaps because of the lack of readily available software to fit 
kriging models, the added complexity of fitting a kriging model, or the additional effort required to 
use a kriging model. Simpson, et al. [124] detail a preliminary comparison of second order 
response surface models and kriging models for the multidisciplinary design of an aerospike nozzle 
which has three geometry (design) variables; neither the kriging models nor the response surface 
models consistently outperform the other in this engineering example. Giunta [50] presents an 
investigation into the use of kriging for the multidisciplinary design optimization of a High Speed 
Civil Transport aircraft. He explores a five and a ten variable design problem, observing that the 
kriging and response surface modeling approaches yield similar results due to the quadratic trend 
of the responses. Osio and Amon [94] have developed an extension of DACE modeling for 
numerical optimization which uses a multistage strategy for refining the accuracy of the model; they 
have applied their approach to the thermal design of an embedded electronic package which has 5 
design variables. Booker, et al. [12] solve a 31 variable helicopter rotor structural design problem 
using a similar approximation methodology based on kriging. Booker [11] extends the helicopter 
rotor design problem to include 56 structural variables to examine the aeroelastic and dynamic 
response of the rotor. Welch, et al. [138] describe a kriging-based approximation methodology 
which they use to identify important variables, detect curvature and interactions, and produce a 
useful approximation model for two 20 variable problems using only 30-50 runs of the computer 
code; they claim their method can cope with up to 30-40 variables provided factor sparsity can be 
exploited. Trosset and Torczon [129] have developed a numerical optimization strategy which 
incorporates DACE modeling and pattern search methods for global optimization. Cox and John 
[32] have developed the Sequential Design for Optimization method which uses lower confidence 
bounds on predicted values of the response for the sequential selection of evaluation points during 
optimization. Both approaches have shown improvements over traditional optimization approaches 
when applied to a variety of standard mathematical test problems. 

3.2 Existing Methods and Tools in Engineering Design 

In this section we present some methods and tools developed specifically for engineering 
which incorporate statistical techniques from Section 2. Since the Taguchi approach and RSM 
have been widely applied in engineering design, a literature review comparing these approaches is 
given first. This is followed by an overview of some methods and “tools” that have been 
developed for general design applications. These include the Robust Concept Exploration Method, 
the Variable-Complexity Response Surface Modeling Method, and Concurrent SubSpace 
Optimization to name a few. 

3.2.1 Taguchi Approach vs. RSM. The Taguchi approach and RSM have been applied 
extensively in engineering design. It is commonly accepted that the principles associated with the 
Taguchi approach are both useful and very appropriate for industrial product design. Ramberg, et 
al. [105] suggest that "the loss function and the associated robust design philosophy provide fresh 
insight into the process of optimizing or improving the simulation's performance." Two aspects of 
the Taguchi approach are often criticized: the choice of experimental design (orthogonal arrays, 
inner and outer) and the loss function (signal-to-noise ratio). It has been argued and demonstrated 
that the use of a single experiment combining control and noise factors is more efficient [122, 131, 
139]. The drawbacks of combining response mean and variance into a single loss function (signal- 
to-noise ratio) are well-documented. Many authors advocate measuring the response directly and 
separately tracking mean and variance (cf., [27, 105, 139]). However, Shoemaker, et al. [122] 
warn that a "potential drawback of the response-model approach is that it depends more critically 
than the loss-model approach on how well the model fits." 

Given the wide acceptance of Taguchi robust design principles and the criticisms, many 
advocate a combined Taguchi-RSM approach or simply using traditional RSM techniques within 
the Taguchi framework [78, 90, 91, 105]. We believe that orthogonal inner and outer arrays, and 
single composite experiments each have advantages and disadvantages and appropriate uses, and 
that separate observation of mean and variance leads to useful insight. Regardless, the core 
principles of both Taguchi and RSM provide a foundation for many of the specific design methods 
discussed in Section 3.2.2. 

3.2.2 An Overview of Existing Methods. The Robust Concept Exploration Method (RCEM) 
facilitates quick evaluation of different design alternatives and generation of top-level design 
specifications in the early stages of design [25, 26]. Foundational to the RCEM is the integration 
of robust design principles, DOE, RSM, and the compromise Decision Support Problem (a 
multiobjective decision model). The RCEM has been applied to the multiobjective design of a 
High Speed Civil Transport [25, 65], a family of General Aviation Aircraft [123], a turbine lift 
engine [64], a solar-powered irrigation system [28], and a flywheel [70]; to manufacturing 
simulation [101]; and to maintainability design of aircraft engines [93]. A preliminary investigation 
into the use of DOE and neural networks to augment the capabilities of response surface modeling 
within the RCEM is given in [29]. 

The Variable-Complexity Response Surface Modeling (VCRSM) Method uses analyses of 
varying fidelity to reduce the design space to the region of interest and build response surface 
models of increasing accuracy (see, e.g., [51, 52]). The VCRSM method employs DOE and RS 
modeling techniques and has been successfully applied to the multidisciplinary wing design of a 
high speed civil transport (see, e.g., [2, 3, 51, 53, 61]), to the analysis and design of composite 
curved channel frames [80], to the structural design of bar trusses [113], to predict the fatigue life 
of structures [135], to reduce numerical noise inherent in structural analyses [53, 136] and shape 
design problems using fluid flow analysis [92], and to facilitate the integration of local and global 
analyses for structural optimization [48, 104, 134]. Coarse-grained parallelization of analysis 
codes for efficient response surface generation has also been investigated [17, 61]. 

Concurrent SubSpace Optimization (CSSO) uses data generated during concurrent subspace 
optimizations to develop response surface approximations of the design space. Optimization of 
these response surfaces forms the basis for the subspace coordination procedure. The data 
generated by the subspace optimizers is not uniformly centered about the current design as in CCD 
or other sampling strategies, but instead follows the descent path of the subspace optimizers. In 
[108, 109, 110], interpolating polynomial response surfaces are constructed which have either a 
first or second order basis for use in the CSSO coordination procedure. In [140, 141], a modified 
decomposition strategy is used to develop quadratic response surfaces for use in the CSSO 
coordination procedure. Finally, in [118-120] artificial neural network response surfaces are 
developed for use in the CSSO coordination procedure. 

Robust Design Simulation (RDS) is a stochastic approach which employs the principles of 
Integrated Product and Process Development (IPPD) for the purpose of determining the optimum 
values of design factors and proposed technologies (in the presence of uncertainty) which yield 
affordable designs with low variability. Toward this end, RDS combines design of experiments 
and response surface metamodels with Monte Carlo simulation and Fast Probability Techniques 
(see, e.g., [6]) to achieve customer satisfaction through robust systems design [83]. RDS has 
been applied to the design of a High Speed Civil Transport aircraft [81, 83] and very large 
transports [82]. RDS has also been used to study the economic uncertainty of the HSCT [36, 84] 
and the feasibility/viability of aircraft [85]. 
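
The pairing of a response surface metamodel with Monte Carlo simulation can be sketched as follows. This is not the RDS implementation: the response surface coefficients, the noise distribution, and the threshold are invented for illustration only.

```python
import numpy as np

def response_surface(x1, x2):
    """Hypothetical fitted second order response surface (coefficients are made up)."""
    return 50.0 + 3.0 * x1 - 2.0 * x2 + 0.5 * x1 * x2 + 1.5 * x1 ** 2

rng = np.random.default_rng(42)
n_samples = 100_000

design_x1 = 0.2  # assumed setting of the controllable design factor
# Assumed uncertainty in the uncontrolled factor: normal, mean 0, std 0.1
noise_x2 = rng.normal(loc=0.0, scale=0.1, size=n_samples)

# The metamodel is cheap, so many evaluations are affordable
y = response_surface(design_x1, noise_x2)

mean_y = y.mean()                # expected response at this design setting
std_y = y.std(ddof=1)            # response variability caused by the noise factor
prob_exceeds = np.mean(y > 51.0) # estimated probability of exceeding a threshold
```

Repeating the calculation for several values of design_x1 would show how the design setting shifts both the mean and the variability of the response, which is the information a robust design study compares.
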

NORMAN/DEBORA is a TCAD (Technology Computer Aided Design) system incorporating 
advanced sequential DOE and RSM techniques to aid in engineering optimization and robust design 
[22]. NORMAN/DEBORA includes a novel design of experiments concept — Target Oriented 
Design — a unique parameter transformation technique — RATIOFIND — and a non-linear, 
constrained optimizer — DEBORA [23]. It has been successfully employed for semiconductor 
integrated circuit design and optimization [20, 21, 23, 57]. An updated and more powerful version 
of NORMAN/DEBORA is being offered as LMS Optimus [75]. 

The Probabilistic Design System (PDS) being developed at Pratt and Whitney uses Box- 
Behnken designs and response surface methodology to perform probabilistic design analysis of 
gas turbine rotors [43, 42]. Fox [44] describes twelve criteria which are used to validate the 
response surfaces which are used in combination with cheap-to-run analyses in a Monte Carlo 
Simulator to estimate the corresponding distributions of the responses and minimum life of system 
components. Adamson [1] describes issues involved with developing, calibrating, using, and 
testing the PDS and discusses Pratt and Whitney’s plans to validate the PDS by designing, 
building, and testing actual parts. 

DOE/Opt is a prototype computer system for DOE, RSM, and optimization [10]. It has been 
used in semiconductor process/device design including process/device optimization, simulator 
tuning, process control recipe generation, and design for manufacturability. 

Hierarchical and Interactive Decision Refinement (HIDER) is a methodology for concept 
exploration in the early stages of design. It integrates simulation, optimization, statistical 
techniques, and machine learning to support design decision making [107, 142]. The 
methodology is used to hierarchically refine/reduce “a large initial design space through a series of 
multiple-objective optimizations, until a fully specified design is obtained” [142]. HIDER uses the 
Adaptive Interactive Modeling System (AIMS) [76] to decompose the design space using distance-based, 
population-based, and hyperplane-based algorithms. HIDER and AIMS have been applied 
to the design of a cutting process [76], a diesel engine [142, 143], and a wheel loader [107]. 

Other approaches incorporating statistical techniques in engineering design exist; only a few 
have been included here. Our focus is not on the methods, but on the appropriateness of the 
statistical techniques; many of the examples to which these methods have been applied employ 
deterministic computer experiments in which the application of statistical techniques is 
questionable. Associated issues are discussed in the next section. 

3.3 A Closer Look at Experimental Design for Deterministic Computer Experiments 

Since engineering design usually involves exercising deterministic computer analysis codes, 
the use of statistical techniques for creating metamodels warrants a closer look. Given a response 
of interest, y, and a vector of independent factors x thought to influence y, the relationship 
between y and x (see Eq. (2)) includes the random error term $\varepsilon$. To apply least squares regression, 
error values at each data point are assumed to have identical and independent normal distributions 
with means of zero and standard deviations of $\sigma$, or $\varepsilon_i \sim$ i.i.d. $N(0, \sigma^2)$, see Figure 7(a). The least 
squares estimator then minimizes the sum of the squared differences between actual data points and 
predicted values. This is acceptable when no data point actually lies on the predicted model 
because it is assumed that the model "smoothes out" random error. Of course, it is likely that the 
regression model itself is merely an approximation of the true behavior of x and y so that the final 
relationship is 

$y = g(x) + \varepsilon_{\mathrm{bias}} + \varepsilon_{\mathrm{random}}$ (16) 

where $\varepsilon_{\mathrm{bias}}$ represents the error of approximation. However, for deterministic computer analyses as 
shown in Figure 7(b), $\varepsilon_{\mathrm{random}}$ has mean zero and variance zero, yielding the relationship 

$y = g(x) + \varepsilon_{\mathrm{bias}}$. (17) 

[Figure 7 graphic not recoverable; surviving panel title: "(b) Deterministic Case"] 

Figure 7. Deterministic and Non-Deterministic Curve Fitting 


The deterministic case in Eq. (17) conflicts sharply with the methods of least squares 
regression. Unless $\varepsilon_{\mathrm{bias}}$ is i.i.d. $N(0, \sigma^2)$, the assumptions for statistical inference from least squares 
regression are violated. Further, since there is no random error it is not justifiable to smooth 
across data points; instead the model should hit each point exactly and interpolate between them as 
in Figure 5(b). Finally, most standard tests for model and parameter significance are based on 
computations of $\varepsilon_{\mathrm{random}}$ and therefore cannot be computed. These observations are supported by 
literature in the statistics community; Sacks, et al. [117] carefully point out that because 
deterministic computer experiments lack random error: 

• response surface model adequacy is determined solely by systematic bias, 

• the usual measures of uncertainty derived from least-squares residuals have no obvious 
statistical meaning (deterministic measures of uncertainty exist, e.g., $\max |\hat{y}(x) - y(x)|$ over 
x, but they may be very difficult to compute), and 

• the classical notions of experimental blocking, replication and randomization are irrelevant. 

Furthermore, some of the methods for the design and analysis of physical experiments (see, 
e.g., [14, 16, 91]) are not ideal for complex, deterministic computer models. "In the presence of 
systematic error rather than random error, statistical testing is inappropriate" [139]. A discussion 
of how the model should interpolate the observations can be found in [116]. 

So where can these methods go wrong? Unfortunately it is easy to misclassify the $\varepsilon_{\mathrm{bias}}$ term 
from a deterministic model as $\varepsilon_{\mathrm{random}}$ and then proceed with standard statistical testing. Several 
authors have reported statistical measures (e.g., F-statistics and root mean square error) to verify 
model adequacy, see, e.g., [55, 64, 132, 136, 139]. However, these measures have no statistical 
meaning since they assume the observations include a random error term with a mean of zero and a 
non-zero standard deviation. Consequently, the use of stepwise regression for polynomial model 
fitting is also inappropriate since it utilizes F-statistics when adding/removing model parameters. 

Some researchers (see, e.g., [51, 53, 92, 135, 136]) have used metamodeling techniques for 
deterministic computer experiments containing numerical noise. Metamodels are used to smooth 
the numerical noise which inhibits the performance of gradient based optimizers (cf., [53, 4]). 
When constructing the metamodels, the numerical noise is used as a surrogate for random error, 
and the standard least-squares approach is then used to determine model significance. The idea of 
equating numerical noise to random error warrants further investigation into the sources and nature 
of this "deterministic" noise. 

How can model accuracy be tested? R-Squared (the model sum of squares divided by the total 
sum of squares) and R-Squared_adjusted (which takes into account the number of parameters in 
the model) are the only measures for verifying model adequacy in deterministic computer 
experiments. These measures are often insufficient; a high R-Squared value can be deceiving. 
Residual plots may be helpful for verifying model adequacy, identifying trends in data, examining 
outliers, etc.; however, validating the model using additional (different) data points is essential. 
Maximum absolute error, average absolute error, and root mean square error for the additional 
validation points can be calculated to assess model accuracy, see, e.g., [124, 136]. Otto, et al. 
[95, 96] and Yesilyurt and Patera [145] have developed a Bayesian-validated surrogate approach 
which uses additional validation points to make qualitative assessments of the quality of the 
approximation model and provide theoretical bounds on the largest discrepancy between the model 
and the actual computer analysis. They have applied their approach to optimization of multi- 
element airfoils [96], design of trapezoidal ducts and axisymmetric bodies [97], and optimization 
of an eddy-promoter heat exchanger [144, 145]. Finally, an alternative method which does not 
require additional points is leave-one-out cross validation [86]. Each sample point used to fit the 
model is removed one at a time, the model is rebuilt without a sample point, and the difference 
between the model without the sample point and actual value at the sample point is computed for all 
of the sample points. 
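
The accuracy measures discussed in this paragraph can be sketched as follows for a second order polynomial metamodel. The analysis function, sample sizes, and helper names are assumptions made for illustration; R-Squared is computed as one minus the residual sum of squares over the total sum of squares, which for a least squares fit with an intercept equals the model sum of squares divided by the total sum of squares.

```python
import numpy as np

def analysis_code(x):
    """Hypothetical deterministic analysis code."""
    return np.exp(x) * np.sin(3.0 * x)

def quadratic_basis(x):
    return np.column_stack([np.ones_like(x), x, x ** 2])

def fit(x, y):
    return np.linalg.lstsq(quadratic_basis(x), y, rcond=None)[0]

def predict(beta, x):
    return quadratic_basis(x) @ beta

x_fit = np.linspace(0.0, 1.0, 9)
y_fit = analysis_code(x_fit)
beta = fit(x_fit, y_fit)

# R-Squared and R-Squared_adjusted from the fitting data
n, p = len(x_fit), len(beta)
ss_res = np.sum((y_fit - predict(beta, x_fit)) ** 2)
ss_tot = np.sum((y_fit - y_fit.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot
r2_adj = 1.0 - (1.0 - r2) * (n - 1) / (n - p)

# Error measures at additional (different) validation points
rng = np.random.default_rng(7)
x_val = rng.uniform(0.0, 1.0, size=20)
abs_err = np.abs(predict(beta, x_val) - analysis_code(x_val))
max_abs_error = abs_err.max()
avg_abs_error = abs_err.mean()
rms_error = np.sqrt(np.mean(abs_err ** 2))

# Leave-one-out cross validation: refit without point i, then predict point i
loo_errors = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i
    beta_i = fit(x_fit[keep], y_fit[keep])
    loo_errors[i] = y_fit[i] - predict(beta_i, x_fit[i:i + 1])[0]
loo_rms_error = np.sqrt(np.mean(loo_errors ** 2))
```

The leave-one-out error is never smaller than the error at the fitting points, which is one reason it is a more honest indicator than R-Squared when no additional runs can be afforded.
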

Given the potential problems in applying least-squares regression to deterministic applications, 
the trade-off then is between appropriateness and practicality. If a response surface is created to 
model data from a deterministic computer analysis code using experimental design and least 
squares fitting, and if it provides good agreement between predicted and actual values, then there is 
no reason to discard it. It should be used, albeit with caution. However, it is important to 
understand the fundamental assumptions of the statistical techniques employed to avoid misleading 
statements about model significance. In the next section we offer some guidelines for the 
appropriate use of statistical metamodeling with deterministic computer analyses. 

4 GUIDELINES AND RECOMMENDATIONS 

How can a designer apply metamodeling tools while avoiding the pitfalls described in Section 
3.3? This can be answered either from the bottom up (tools -> applications, Section 4.1) or from 
the top down (motives -> tools, Section 4.2). 

4.1 Evaluation of Metamodeling Techniques 

There are two components to this section. The first is an evaluation of the four metamodeling 
techniques described in Section 2.2. The second component is choosing an experimental design, 
which has more direct applicability to response surface methods. Determining which experimental 
designs are most appropriate for the other metamodeling techniques discussed in Section 2.2 is 
an open research area. 


4.1.1 Evaluation of Model Choice and Model Fitting Alternatives. Some guidelines for the 
evaluation of the metamodeling techniques presented in Section 2.2 are summarized in Table 2. 


Table 2. Recommendations for Model Choice and Use 

Model Choice         Characteristics/Appropriate Uses 
-------------------  ------------------------------------------------------ 
Response Surfaces    • well-established and easy to use 
                     • best suited for applications with random error 
                     • appropriate for applications with < 10 factors 

Neural Networks      • good for highly nonlinear or very large problems 
                       (~10,000 parameters) 
                     • best suited for deterministic applications 
                     • high computational expense (often > 10,000 
                       training data points); best for repeated application 

Rule Induction /     • best when factors and responses are discrete-valued 
Inductive Learning   • form of model is rules or decision tree; better suited 
                       to diagnosis than engineering design 

Kriging              • extremely flexible but complex 
                     • well-suited for deterministic applications 
                     • can handle applications with < 50 factors 
                     • limited support is currently available for 
                       implementation 


Response Surfaces: primarily intended for applications with random error; however, they have 
been used successfully in many engineering design applications. It is the most well-established 
metamodeling technique and is probably the easiest to use, provided the user is aware of the 
possible pitfalls described in Section 3.3. 

Neural Networks: nonlinear regression approach best suited to deterministic applications which 
require repeated use. Building a neural network for a one-shot use can be extremely inefficient due 
to the computational overhead required. 

Inductive Learning: modeling technique most appropriate when input and output factors are 
primarily discrete-valued or can be grouped. The predictive model, in the form of condition-action 
rules or a decision tree, may lack the mathematical insight desired for engineering design. 

Kriging: an interpolation method capable of handling deterministic data which is extremely 
flexible due to the wide range of correlation functions which may be chosen. However, the 
method is more complex than response surface modeling and lacks readily available computer 
support software. 
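
The interpolating behavior of kriging can be sketched as follows. This is a minimal illustration that assumes a constant trend, a Gaussian correlation function, and fixed correlation parameters theta; in practice theta is usually estimated (e.g., by maximum likelihood), and the test function and names here are hypothetical.

```python
import numpy as np

def gaussian_correlation(XA, XB, theta):
    """Gaussian correlation exp(-sum_k theta_k * (xa_k - xb_k)^2) between two point sets."""
    d2 = (XA[:, None, :] - XB[None, :, :]) ** 2
    return np.exp(-np.sum(theta * d2, axis=2))

def fit_kriging(X, y, theta):
    """Estimate the constant trend and the weights of the correlation terms."""
    R = gaussian_correlation(X, X, theta)
    ones = np.ones(len(X))
    # Generalized least squares estimate of the constant trend
    mu = (ones @ np.linalg.solve(R, y)) / (ones @ np.linalg.solve(R, ones))
    weights = np.linalg.solve(R, y - mu)
    return mu, weights

def predict_kriging(X_train, mu, weights, theta, X_new):
    """Predictor: constant trend plus correlation-weighted departures from it."""
    r = gaussian_correlation(X_new, X_train, theta)
    return mu + r @ weights

def analysis(X):
    """Hypothetical deterministic analysis code with two inputs."""
    return np.sin(4.0 * X[:, 0]) + X[:, 1] ** 2

rng = np.random.default_rng(3)
X_train = rng.uniform(0.0, 1.0, size=(12, 2))
y_train = analysis(X_train)
theta = np.array([10.0, 10.0])  # assumed fixed, not estimated

mu, weights = fit_kriging(X_train, y_train, theta)
y_at_samples = predict_kriging(X_train, mu, weights, theta, X_train)
y_new = predict_kriging(X_train, mu, weights, theta, np.array([[0.5, 0.5]]))
```

Unlike a least squares response surface, the predictor passes through every sampled response, which is the behavior argued for in Section 3.3 when the data contain no random error.
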

4.1.2 Evaluation of Experimental Designs. There are many voices in the discussion of the 
relative merits of different experimental designs, and it is therefore unlikely that we have captured 
them all. The opinions on the appropriate experimental design for computer analyses vary; the 
only consensus reached thus far is that designs for non-random, deterministic computer 
experiments should be “space filling.” Several “space filling” designs were discussed previously 
in Section 2.1.1. For a comparison of some specific design types, we refer the reader to the 
following articles. 

• Myers and Montgomery [91] provide a comprehensive review of experimental designs for fitting 
second order response surfaces. They conclude that hybrid designs are useful, if the unusual 
levels of the design variables can be tolerated; with computer experiments this is unlikely to be a 
problem. 

• Carpenter [19] examines the effect of design selection on response surface performance. He 
compares $2^k$ and $3^k$ factorial designs, central composite designs, minimum point designs, and 
minimum point designs augmented by additional randomly selected points; he favors the 
augmented point designs for problems involving more than 6 variables. 

• Giovannitti-Jensen and Myers [49] discuss several first and second order designs, observing that 
the performance of rotatable CCD and Box-Behnken designs are nearly identical. They note that 
"hybrid designs appear to be very promising." 

• Lucas [79] compares CCD, Box-Behnken, uniform shell, Hoke, Pesotchinsky, and Box-Draper 
designs, using the D-efficiency and G-efficiency statistics. 

• Montgomery and Evans [88] compare six second order designs: (a) 3^2 factorial, (b) rotatable
orthogonal CCD, (c) rotatable uniform precision CCD, (d) rotatable minimum bias CCD, (e)
rotatable orthogonal hexagon, and (f) rotatable uniform precision hexagon. Comparison criteria
include average response achievement and distance from true optimum.

• Lucas [77] compares symmetric and asymmetric composite and smallest composite designs for 
different numbers of factors using the D-efficiency and G-efficiency statistics. 
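
Several of the comparisons above rank designs by D-efficiency and G-efficiency. As a rough illustration only, the sketch below computes both for a face-centered CCD and a 3^k factorial in three factors under a full second order model. It assumes the common definitions D-eff = |X'X/n|^(1/p) and G-eff = p / max(n x'(X'X)^-1 x) with factors coded to [-1, 1]; the cited papers may normalize differently. The function names are illustrative, and the maximum prediction variance is only approximated on a 5-level grid.

```python
import itertools
import numpy as np

def model_matrix(design, quadratic=True):
    """Expand an (n, k) design in coded units into regression columns."""
    d = np.asarray(design, dtype=float)
    n, k = d.shape
    cols = [np.ones(n)] + [d[:, i] for i in range(k)]
    if quadratic:
        cols += [d[:, i] * d[:, j] for i, j in itertools.combinations(range(k), 2)]
        cols += [d[:, i] ** 2 for i in range(k)]
    return np.column_stack(cols)

def d_efficiency(X):
    """|X'X / n|^(1/p): determinant of the moment matrix, normalized per parameter."""
    n, p = X.shape
    return float(np.linalg.det(X.T @ X / n) ** (1.0 / p))

def g_efficiency(X, X_cand):
    """p divided by the largest scaled prediction variance over the candidate points."""
    n, p = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    # n * x'(X'X)^-1 x evaluated row by row for every candidate point
    spv = n * np.einsum("ij,jk,ik->i", X_cand, xtx_inv, X_cand)
    return float(p / spv.max())

def full_factorial(levels, k):
    """All combinations of the given levels in k factors."""
    return np.array(list(itertools.product(levels, repeat=k)), dtype=float)

def face_centered_ccd(k):
    """2^k corners, 2k face-centered axial points, and one center point."""
    corners = full_factorial([-1.0, 1.0], k)
    axial = np.vstack([s * np.eye(k) for s in (-1.0, 1.0)])
    return np.vstack([corners, axial, np.zeros((1, k))])

if __name__ == "__main__":
    k = 3
    # Candidate grid used to approximate the maximum prediction variance
    grid = model_matrix(full_factorial(np.linspace(-1, 1, 5), k))
    for name, design in [("face-centered CCD", face_centered_ccd(k)),
                         ("3^k factorial", full_factorial([-1.0, 0.0, 1.0], k))]:
        X = model_matrix(design)
        print(f"{name}: runs={len(design)}, D-eff={d_efficiency(X):.3f}, "
              f"G-eff={g_efficiency(X, grid):.3f}")
```

Both measures are normalized by the number of runs, so they describe information per run rather than total information; a smaller design can score higher than a larger one.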

4.2 Recommendations for Metamodeling Uses 

Most metamodeling applications are built around creating low order polynomials using central 
composite designs and least squares regression. The popularity of this approach is due, at least in 
part, to the maturity of RSM, its simplicity, and readily accessible software tools. However, RSM 
breaks down when there are many (> 10) factors or highly nonlinear responses. Furthermore, 
there are also dangers in applying RSM blindly in deterministic applications as discussed in Section 
3.3. Alternative approaches to metamodeling (see Section 4.1.1) address some of these 
limitations. Our recommendations are: 

• If many factors must be modeled in a deterministic application, neural networks may be the 
best choice despite their tendency to be computationally expensive to create. 

• If the underlying function to be modeled is deterministic and highly nonlinear in a moderate 
number of factors (less than 50, say), then kriging may be the best choice despite the added 
complexity. 

• In deterministic applications with a few fairly well behaved factors, another option for 
exploration is using the standard RSM approach augmented by a Taguchi outer (noise) array. 

RSM/OA approach: The basic problem in applying least-squares regression to deterministic
applications is the lack of ε_random in Eq. (17). However, if some input parameters in the computer
analysis are classified as noise factors, and if these noise factors are varied across an outer array
for each setting of the control factors, then essentially a series of replications is generated to
approximate ε_random. This is justified if it is reasonable to assume that, were the experiments
performed on an actual physical system, the random error observed would have been due to these
noise factor fluctuations. Statistical testing of model and parameter significance can then be
performed, and models of both mean response and variability are created from the same set of
experiments. Further discussion and a preliminary investigation into such an approach are given in
[72].
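
The crossed-array idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of [72]: the function analysis_code is a made-up deterministic stand-in for an expensive analysis, the inner (control) array is a 3^2 factorial, the outer (noise) array is a 2^2 factorial with noise levels of ±0.1, and second order polynomials are fitted by least squares to the mean and to the sample variance observed across the outer array.

```python
import itertools
import numpy as np

def analysis_code(x, z):
    """Hypothetical deterministic analysis: x = control factors, z = noise factors."""
    x1, x2 = x
    z1, z2 = z
    return (x1 - 0.3) ** 2 + 2.0 * (x2 + 0.2) ** 2 + (1.5 + x1) * z1 + 0.5 * x2 * z2

def quadratic_terms(x):
    """Columns of a full second order model in two control factors."""
    x1, x2 = x
    return [1.0, x1, x2, x1 * x2, x1 ** 2, x2 ** 2]

def fit_mean_and_variance():
    inner = list(itertools.product([-1.0, 0.0, 1.0], repeat=2))  # control (inner) array
    outer = list(itertools.product([-0.1, 0.1], repeat=2))       # noise (outer) array

    means, variances = [], []
    for x in inner:
        # Runs across the outer array act as pseudo-replicates for this control setting
        y = np.array([analysis_code(x, z) for z in outer])
        means.append(y.mean())
        variances.append(y.var(ddof=1))

    X = np.array([quadratic_terms(x) for x in inner])
    beta_mean, *_ = np.linalg.lstsq(X, np.array(means), rcond=None)
    beta_var, *_ = np.linalg.lstsq(X, np.array(variances), rcond=None)
    return beta_mean, beta_var

if __name__ == "__main__":
    beta_mean, beta_var = fit_mean_and_variance()
    names = ["1", "x1", "x2", "x1*x2", "x1^2", "x2^2"]
    for name, bm, bv in zip(names, beta_mean, beta_var):
        print(f"{name:6s} mean coeff = {bm: .4f}   variance coeff = {bv: .4f}")
```

Because the spread across the outer array now plays the role of replication error, the residual degrees of freedom needed for significance testing come from the noise factors rather than from repeated identical runs, which would return identical values in a deterministic code.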

5 SUMMARY AND CLOSING REMARKS 

In this paper we have surveyed some applications of statistics in engineering design and
discussed the concept of metamodeling (see Section 1 and Figure 6). However, applying these
techniques to deterministic applications in engineering design can cause problems (see Sections 3.1
and 3.3). We present recommendations for applying metamodeling techniques in Section 4, but
these recommendations are by no means complete. Comprehensive comparisons of these
techniques must be performed; preliminary and ongoing investigations into the use of kriging as an
alternative metamodeling technique to response surfaces are described in [124].

The difficulties of large problem size and non-linearity are ever-present. In particular, an issue 
of interest to us is the problem of size [65]. As the number of factors in the problem increases, the 
cost associated with creating metamodels begins to outweigh the gains in efficiency. In addition,
screening is often insufficient to reduce the problem to a manageable size. This difficulty is
compounded by the multiple response problem: complex engineering design problems invariably
include multiple measures of performance (responses) to be modeled. The screening process breaks
down when attempting to select the most important factors for more than one response since each 
response may require different important factors. The general question arising from these 
problems, then, is how can these experimentation and metamodeling techniques be used efficiently 
for larger problems (problems with greater than 10 factors after screening)? One approach is 
problem partitioning or decomposition. Using these techniques, a complex problem may be 
broken down into smaller problems allowing efficient experimentation and metamodeling, which 
again leads to comprehensive and efficient exploration of a design space [63]. A significant 
literature base exists of techniques for breaking a problem into smaller problems; a good review of 
such methods can be found in [73]. Detailed reviews of multidisciplinary design optimization
approaches for formulating and concurrently solving decomposed problems are presented in [125]
and [33], and a comparison of some of these approaches is given in [5].
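
The growth in experimental cost with problem size noted above can be made concrete with a small calculation. The sketch below compares the number of coefficients in a full second order polynomial, (k + 1)(k + 2) / 2, with the number of runs in a central composite design built on a full 2^k factorial core, 2^k + 2k + n_c. The choice of one center point and the function names are assumptions made for illustration; in practice a fractional factorial core is normally substituted for larger k, which reduces but does not remove the growth.

```python
def ccd_runs(k, center_points=1):
    """Runs in a central composite design with a full 2^k factorial core."""
    return 2 ** k + 2 * k + center_points

def quadratic_coefficients(k):
    """Number of coefficients in a full second order polynomial in k factors."""
    return (k + 1) * (k + 2) // 2

if __name__ == "__main__":
    for k in (2, 5, 10, 15, 20):
        print(f"k={k:2d}  coefficients={quadratic_coefficients(k):4d}  "
              f"CCD runs={ccd_runs(k):8d}")
```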